Overview

Dataset statistics

Number of variables: 9
Number of observations: 710
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 50.0 KiB
Average record size in memory: 72.2 B
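The missing-cell and duplicate-row counts above can be reproduced with plain Python. A minimal sketch, assuming rows are held as a list of dicts with `None` marking a missing cell; `missing_and_duplicates` and the toy rows are hypothetical, not pandas-profiling's own code:

```python
def missing_and_duplicates(rows):
    """Count missing cells (None) and repeated rows in a list of dicts."""
    columns = list(rows[0].keys()) if rows else []
    total_cells = len(rows) * len(columns)

    # A cell is treated as missing when its value is None.
    missing = sum(1 for row in rows for col in columns if row.get(col) is None)

    # A row counts as a duplicate when an identical row appeared earlier,
    # the same convention as pandas' DataFrame.duplicated(keep="first").
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(row.get(col) for col in columns)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)

    def pct(part, whole):
        return 100.0 * part / whole if whole else 0.0

    return {
        "missing_cells": missing,
        "missing_cells_pct": pct(missing, total_cells),
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": pct(duplicates, len(rows)),
    }


rows = [
    {"Mkt_RF": 0.01, "RF": None},
    {"Mkt_RF": 0.01, "RF": None},
    {"Mkt_RF": 0.02, "RF": 0.003},
]
print(missing_and_duplicates(rows))
```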

Variable types

DateTime: 1
Numeric: 6
Categorical: 2

Alerts

HML is highly correlated with Mkt_RF and 5 other fields (High correlation)
CMA is highly correlated with Mkt_RF and 1 other field (High correlation)
Mkt_RF is highly correlated with SMB and 4 other fields (High correlation)
SMB is highly correlated with Mkt_RF and 2 other fields (High correlation)
RMW is highly correlated with SMB and 1 other field (High correlation)
Best is highly correlated with Mkt_RF and 2 other fields (High correlation)
Worst is highly correlated with Mkt_RF and 2 other fields (High correlation)
Date has unique values (Unique)
RF has 69 (9.7%) zeros (Zeros)

Reproduction

Analysis started: 2022-10-11 04:47:49.656335
Analysis finished: 2022-10-11 04:47:51.701924
Duration: 2.05 seconds
Software version: pandas-profiling v3.3.0
Download configuration: config.json

Variables

Date
Date

UNIQUE

Distinct: 710
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 5.7 KiB
Minimum: 1963-07-01 00:00:00
Maximum: 2022-08-01 00:00:00
[Figure: Histogram with fixed size bins (bins=50)]
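Assuming the series is monthly with one observation per month, which the 710 distinct dates over 710 observations suggest, the row count follows from the first and last dates. A small check (the helper name `months_inclusive` is hypothetical):

```python
from datetime import date


def months_inclusive(start: date, end: date) -> int:
    """Calendar months from start to end, counting both endpoints."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


print(months_inclusive(date(1963, 7, 1), date(2022, 8, 1)))  # 710
```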

Mkt_RF
Real number (ℝ)

HIGH CORRELATION

Distinct: 566
Distinct (%): 79.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.004547425339
Minimum: -0.2644865148
Maximum: 0.1492817027
Zeros: 1
Zeros (%): 0.1%
Negative: 285
Negative (%): 40.1%
Memory size: 5.7 KiB

Quantile statistics

Minimum: -0.2644865148
5th percentile: -0.07512775864
Q1: -0.01987113062
Median: 0.009108391137
Q3: 0.03343477609
95th percentile: 0.06849466003
Maximum: 0.1492817027
Range: 0.4137682176
Interquartile range (IQR): 0.0533059067
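The quantile block can be recomputed from raw values. This sketch assumes linear interpolation between order statistics, the default in pandas' `Series.quantile` and NumPy's `percentile`; the report does not state which rule it uses. `quantile` and `quantile_summary` are hypothetical helper names:

```python
import math


def quantile(values, q):
    """Quantile with linear interpolation between sorted values."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)  # fractional index into the sorted data
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def quantile_summary(values):
    q1, q3 = quantile(values, 0.25), quantile(values, 0.75)
    return {
        "min": min(values),
        "p5": quantile(values, 0.05),
        "q1": q1,
        "median": quantile(values, 0.5),
        "q3": q3,
        "p95": quantile(values, 0.95),
        "max": max(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
    }


# The reported Mkt_RF figures satisfy IQR = Q3 - Q1 and range = max - min.
print(0.03343477609 - (-0.01987113062))  # about 0.05330591
print(0.1492817027 - (-0.2644865148))    # about 0.41376822
```

The small difference in the last digit of the range comes from the rounding of the displayed minimum and maximum.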

Descriptive statistics

Standard deviation: 0.04521785727
Coefficient of variation (CV): 9.943617301
Kurtosis: 2.621816834
Mean: 0.004547425339
Median Absolute Deviation (MAD): 0.02676332638
Skewness: -0.7615005447
Sum: 3.228671991
Variance: 0.002044654616
Monotonicity: Not monotonic
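A sketch of the descriptive block, assuming the sample (n - 1) standard deviation and a MAD defined as the median absolute deviation from the median, without the 1.4826 normal-consistency factor; the function name `descriptive_stats` is hypothetical:

```python
import math
import statistics


def descriptive_stats(values):
    """Moment-based summary using the labels from the report."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    median = statistics.median(values)
    return {
        "mean": mean,
        "std": std,
        "variance": std ** 2,
        "cv": std / mean,  # coefficient of variation; unstable when the mean is near 0
        "mad": statistics.median(abs(x - median) for x in values),
        "sum": math.fsum(values),
    }


# Cross-check against the reported Mkt_RF figures: CV = std / mean.
print(0.04521785727 / 0.004547425339)  # about 9.9436, the reported CV
```

The CVs in this report are large (about 9.9 for Mkt_RF) mainly because each monthly mean is small relative to its standard deviation; CV is hard to interpret for series that take negative values.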
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value | Count | Frequency (%)
-0.01389610519 | 3 | 0.4%
-0.0145046862 | 3 | 0.4%
0.01024731645 | 3 | 0.4%
0.01390290517 | 3 | 0.4%
0.007769737264 | 3 | 0.4%
-0.02316627803 | 3 | 0.4%
0.06700422878 | 3 | 0.4%
0.0141987194 | 3 | 0.4%
0.03062619354 | 3 | 0.4%
0.02039068965 | 3 | 0.4%
Other values (556) | 680 | 95.8%

Minimum 10 values
Value | Count | Frequency (%)
-0.2644865148 | 1 | 0.1%
-0.1891045091 | 1 | 0.1%
-0.1753062219 | 1 | 0.1%
-0.1437549036 | 1 | 0.1%
-0.1381133021 | 1 | 0.1%
-0.1363926249 | 1 | 0.1%
-0.1268111669 | 1 | 0.1%
-0.1252231448 | 1 | 0.1%
-0.1165338163 | 1 | 0.1%
-0.1133926874 | 1 | 0.1%

Maximum 10 values
Value | Count | Frequency (%)
0.1492817027 | 1 | 0.1%
0.1280413499 | 1 | 0.1%
0.1279533643 | 1 | 0.1%
0.1175163334 | 2 | 0.3%
0.1147562373 | 1 | 0.1%
0.1075082077 | 1 | 0.1%
0.1070590723 | 1 | 0.1%
0.1056204819 | 1 | 0.1%
0.102917534 | 1 | 0.1%
0.09785240017 | 1 | 0.1%
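The three value tables (common values, then the 10 smallest and 10 largest values) can be built with `collections.Counter`. A sketch on toy data; the helper names are hypothetical, and values with equal counts may be ordered differently than in the report:

```python
from collections import Counter


def value_count_table(values, top=10):
    """Top-N values with counts and percentages, plus an 'Other values' row."""
    n = len(values)
    counts = Counter(values)
    common = counts.most_common(top)
    rows = [(value, count, 100.0 * count / n) for value, count in common]
    other_distinct = len(counts) - len(common)
    if other_distinct:
        other_count = n - sum(count for _, count in common)
        rows.append((f"Other values ({other_distinct})", other_count, 100.0 * other_count / n))
    return rows


def extreme_values(values, k=10):
    """The k smallest and k largest distinct values with their counts."""
    items = sorted(Counter(values).items())
    return items[:k], items[::-1][:k]


data = [0.01, 0.01, 0.01, -0.02, -0.02, 0.03, 0.05, -0.07]
for row in value_count_table(data, top=2):
    print(row)
smallest, largest = extreme_values(data, k=2)
print(smallest, largest)
```

For Mkt_RF, the ten listed common values cover 30 rows and "Other values (556)" covers 680, which together account for all 710 observations.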

SMB
Real number (ℝ)

HIGH CORRELATION

Distinct: 510
Distinct (%): 71.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.001815165953
Minimum: -0.1666450774
Maximum: 0.1683916513
Zeros: 2
Zeros (%): 0.3%
Negative: 340
Negative (%): 47.9%
Memory size: 5.7 KiB

Quantile statistics

Minimum: -0.1666450774
5th percentile: -0.04390486804
Q1: -0.01529131954
Median: 0.0009995003331
Q3: 0.02014570209
95th percentile: 0.04797070899
Maximum: 0.1683916513
Range: 0.3350367287
Interquartile range (IQR): 0.03543702163

Descriptive statistics

Standard deviation: 0.03009384567
Coefficient of variation (CV): 16.57911533
Kurtosis: 2.955098793
Mean: 0.001815165953
Median Absolute Deviation (MAD): 0.01787131921
Skewness: 0.1197920192
Sum: 1.288767826
Variance: 0.0009056395471
Monotonicity: Not monotonic
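The skewness and kurtosis figures are sample estimators. This sketch assumes the report follows pandas' `Series.skew()` (adjusted Fisher-Pearson G1) and `Series.kurt()` (bias-corrected excess kurtosis G2); the function names are hypothetical:

```python
import math


def skewness(values):
    """Adjusted Fisher-Pearson sample skewness G1."""
    n = len(values)
    mean = math.fsum(values) / n
    m2 = math.fsum((x - mean) ** 2 for x in values)  # sum of squared deviations
    m3 = math.fsum((x - mean) ** 3 for x in values)  # sum of cubed deviations
    return n * math.sqrt(n - 1) / (n - 2) * m3 / m2 ** 1.5


def excess_kurtosis(values):
    """Bias-corrected excess kurtosis G2 (about 0 for normal data)."""
    n = len(values)
    mean = math.fsum(values) / n
    m2 = math.fsum((x - mean) ** 2 for x in values)
    m4 = math.fsum((x - mean) ** 4 for x in values)
    num = n * (n + 1) * (n - 1) * m4
    den = (n - 2) * (n - 3) * m2 ** 2
    return num / den - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))


print(skewness([1, 2, 3, 10]), excess_kurtosis([1, 2, 3, 4, 5]))
```

Because G2 measures excess kurtosis, a normal distribution scores about 0; every numeric column in this report is positive, with RMW (14.4) the most heavy-tailed.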
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value | Count | Frequency (%)
0.001299155732 | 5 | 0.7%
0.02673929719 | 4 | 0.6%
-0.01399750964 | 4 | 0.6%
-0.01146547811 | 4 | 0.6%
0.003095204907 | 4 | 0.6%
-0.006219299814 | 4 | 0.6%
-0.004108428045 | 3 | 0.4%
0.01901800584 | 3 | 0.4%
-0.01075765665 | 3 | 0.4%
-0.0005001250417 | 3 | 0.4%
Other values (500) | 673 | 94.8%

Minimum 10 values
Value | Count | Frequency (%)
-0.1666450774 | 1 | 0.1%
-0.1055827626 | 1 | 0.1%
-0.08675686393 | 1 | 0.1%
-0.08414276811 | 1 | 0.1%
-0.07558598696 | 1 | 0.1%
-0.07181828779 | 1 | 0.1%
-0.07160341886 | 1 | 0.1%
-0.0706370796 | 1 | 0.1%
-0.06667413327 | 1 | 0.1%
-0.0664603667 | 1 | 0.1%

Maximum 10 values
Value | Count | Frequency (%)
0.1683916513 | 1 | 0.1%
0.1214208552 | 1 | 0.1%
0.09903052346 | 1 | 0.1%
0.0946736136 | 1 | 0.1%
0.08782771037 | 1 | 0.1%
0.08709470685 | 1 | 0.1%
0.08167214864 | 1 | 0.1%
0.07686844426 | 1 | 0.1%
0.07334339422 | 1 | 0.1%
0.07269268539 | 1 | 0.1%

HML
Real number (ℝ)

HIGH CORRELATION

Distinct: 498
Distinct (%): 70.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.002539667261
Minimum: -0.1504741134
Maximum: 0.1200027924
Zeros: 0
Zeros (%): 0.0%
Negative: 326
Negative (%): 45.9%
Memory size: 5.7 KiB

Quantile statistics

Minimum: -0.1504741134
5th percentile: -0.0418642041
Q1: -0.01397215853
Median: 0.002446992448
Q3: 0.01734863833
95th percentile: 0.05260193328
Maximum: 0.1200027924
Range: 0.2704769057
Interquartile range (IQR): 0.03132079687

Descriptive statistics

Standard deviation: 0.02958606156
Coefficient of variation (CV): 11.64958182
Kurtosis: 2.541712089
Mean: 0.002539667261
Median Absolute Deviation (MAD): 0.01578555817
Skewness: -0.08993790357
Sum: 1.803163755
Variance: 0.0008753350384
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value | Count | Frequency (%)
0.008464078412 | 7 | 1.0%
0.01163208423 | 4 | 0.6%
0.001498876124 | 4 | 0.6%
0.01734863833 | 4 | 0.6%
-0.0002000200027 | 4 | 0.6%
-0.001300845733 | 4 | 0.6%
0.004290781417 | 4 | 0.6%
0.01182975175 | 4 | 0.6%
0.02244618883 | 4 | 0.6%
-0.02798803654 | 3 | 0.4%
Other values (488) | 668 | 94.1%

Minimum 10 values
Value | Count | Frequency (%)
-0.1504741134 | 1 | 0.1%
-0.1197975635 | 1 | 0.1%
-0.1039171134 | 1 | 0.1%
-0.1020327256 | 1 | 0.1%
-0.08806647887 | 1 | 0.1%
-0.08697501401 | 1 | 0.1%
-0.08686593302 | 1 | 0.1%
-0.0814269987 | 1 | 0.1%
-0.07969276891 | 1 | 0.1%
-0.0720332029 | 1 | 0.1%

Maximum 10 values
Value | Count | Frequency (%)
0.1200027924 | 1 | 0.1%
0.1176052421 | 1 | 0.1%
0.1161817543 | 1 | 0.1%
0.08277742646 | 1 | 0.1%
0.08075014969 | 1 | 0.1%
0.07973496802 | 1 | 0.1%
0.07955027876 | 1 | 0.1%
0.07871875471 | 1 | 0.1%
0.07853387765 | 1 | 0.1%
0.07352923329 | 1 | 0.1%

RMW
Real number (ℝ)

HIGH CORRELATION

Distinct: 446
Distinct (%): 62.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.002474803022
Minimum: -0.2073932412
Maximum: 0.1230137759
Zeros: 1
Zeros (%): 0.1%
Negative: 309
Negative (%): 43.5%
Memory size: 5.7 KiB

Quantile statistics

Minimum: -0.2073932412
5th percentile: -0.02786984355
Q1: -0.007906172524
Median: 0.0023971246
Q3: 0.01299025912
95th percentile: 0.0341211896
Maximum: 0.1230137759
Range: 0.3304070171
Interquartile range (IQR): 0.02089643165

Descriptive statistics

Standard deviation: 0.02224253239
Coefficient of variation (CV): 8.98759707
Kurtosis: 14.42674042
Mean: 0.002474803022
Median Absolute Deviation (MAD): 0.01052460425
Skewness: -0.7885566537
Sum: 1.757110146
Variance: 0.0004947302471
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value | Count | Frequency (%)
0.00299550898 | 7 | 1.0%
0.01291622527 | 5 | 0.7%
0.002696361548 | 5 | 0.7%
0.01301493708 | 5 | 0.7%
-0.006823225348 | 5 | 0.7%
0.02019470729 | 4 | 0.6%
0.0007996801706 | 4 | 0.6%
-0.01379471102 | 4 | 0.6%
0.009257021263 | 4 | 0.6%
-0.004208844774 | 4 | 0.6%
Other values (436) | 663 | 93.4%

Minimum 10 values
Value | Count | Frequency (%)
-0.2073932412 | 1 | 0.1%
-0.0966210386 | 1 | 0.1%
-0.08686593302 | 1 | 0.1%
-0.07904320734 | 1 | 0.1%
-0.07321606233 | 1 | 0.1%
-0.06517872602 | 1 | 0.1%
-0.04919024419 | 1 | 0.1%
-0.04814037533 | 2 | 0.3%
-0.04730127312 | 1 | 0.1%
-0.04541586353 | 1 | 0.1%

Maximum 10 values
Value | Count | Frequency (%)
0.1230137759 | 1 | 0.1%
0.1117202496 | 1 | 0.1%
0.09166718853 | 1 | 0.1%
0.08718636168 | 1 | 0.1%
0.07751644243 | 1 | 0.1%
0.07380792714 | 1 | 0.1%
0.07157619849 | 1 | 0.1%
0.06971261241 | 1 | 0.1%
0.06259914176 | 1 | 0.1%
0.06100102156 | 1 | 0.1%

CMA
Real number (ℝ)

HIGH CORRELATION

Distinct: 443
Distinct (%): 62.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.002633402886
Minimum: -0.07192573957
Maximum: 0.08663630666
Zeros: 1
Zeros (%): 0.1%
Negative: 329
Negative (%): 46.3%
Memory size: 5.7 KiB

Quantile statistics

Minimum: -0.07192573957
5th percentile: -0.02691395425
Q1: -0.01005033585
Median: 0.000949547788
Q3: 0.01479008547
95th percentile: 0.03614868703
Maximum: 0.08663630666
Range: 0.1585620462
Interquartile range (IQR): 0.02484042133

Descriptive statistics

Standard deviation: 0.02029502409
Coefficient of variation (CV): 7.706767619
Kurtosis: 1.369297904
Mean: 0.002633402886
Median Absolute Deviation (MAD): 0.01256676841
Skewness: 0.2012300489
Sum: 1.869716049
Variance: 0.0004118880029
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value | Count | Frequency (%)
-0.003405793135 | 6 | 0.8%
0.0008995952428 | 5 | 0.7%
0.008959741371 | 5 | 0.7%
-0.0004000800213 | 5 | 0.7%
0.008364916332 | 5 | 0.7%
-0.01207258123 | 4 | 0.6%
0.004589452334 | 4 | 0.6%
-0.01612938193 | 4 | 0.6%
-0.003305457009 | 4 | 0.6%
-0.009545412844 | 4 | 0.6%
Other values (433) | 664 | 93.5%

Minimum 10 values
Value | Count | Frequency (%)
-0.07192573957 | 1 | 0.1%
-0.07010062768 | 1 | 0.1%
-0.06849299645 | 1 | 0.1%
-0.06006852647 | 1 | 0.1%
-0.05826490813 | 1 | 0.1%
-0.05794695995 | 1 | 0.1%
-0.05129329439 | 1 | 0.1%
-0.04856019062 | 1 | 0.1%
-0.04814037533 | 1 | 0.1%
-0.04646287441 | 1 | 0.1%

Maximum 10 values
Value | Count | Frequency (%)
0.08663630666 | 1 | 0.1%
0.08056564784 | 1 | 0.1%
0.07427224437 | 1 | 0.1%
0.0635380208 | 1 | 0.1%
0.06259914176 | 1 | 0.1%
0.06024808035 | 1 | 0.1%
0.0575139062 | 1 | 0.1%
0.05741949087 | 1 | 0.1%
0.05723063345 | 1 | 0.1%
0.05496155807 | 1 | 0.1%

RF
Real number (ℝ≥0)

ZEROS

Distinct: 106
Distinct (%): 14.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.003616216582
Minimum: 0
Maximum: 0.01340968691
Zeros: 69
Zeros (%): 9.7%
Negative: 0
Negative (%): 0.0%
Memory size: 5.7 KiB
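The ZEROS alert on RF reflects a plain count: 69 of the 710 values are exactly zero. A sketch of the zero and negative counts (the helper name `sign_counts` is hypothetical):

```python
def sign_counts(values):
    """Counts and percentages of exact zeros and of negative values."""
    n = len(values)
    zeros = sum(1 for v in values if v == 0)
    negatives = sum(1 for v in values if v < 0)
    return {
        "zeros": zeros,
        "zeros_pct": 100.0 * zeros / n,
        "negative": negatives,
        "negative_pct": 100.0 * negatives / n,
    }


print(sign_counts([0.0, 0.0, -0.0012, 0.0031]))
print(round(100 * 69 / 710, 1))  # 9.7, the reported Zeros (%) for RF
```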

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0.001399020914
Median: 0.003792798239
Q3: 0.005087039049
95th percentile: 0.008067371078
Maximum: 0.01340968691
Range: 0.01340968691
Interquartile range (IQR): 0.003688018135

Descriptive statistics

Standard deviation: 0.002670242841
Coefficient of variation (CV): 0.7384078858
Kurtosis: 0.6088636049
Mean: 0.003616216582
Median Absolute Deviation (MAD): 0.001743290106
Skewness: 0.6509558854
Sum: 2.567513773
Variance: 7.130196828 × 10⁻⁶
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value | Count | Frequency (%)
0 | 69 | 9.7%
9.999500033 × 10⁻⁵ | 44 | 6.2%
0.004290781417 | 21 | 3.0%
0.00399202127 | 21 | 3.0%
0.004191204618 | 18 | 2.5%
0.004589452334 | 18 | 2.5%
0.003892414715 | 16 | 2.3%
0.003095204907 | 16 | 2.3%
0.004390348301 | 16 | 2.3%
0.003693171838 | 15 | 2.1%
Other values (96) | 456 | 64.2%

Minimum 10 values
Value | Count | Frequency (%)
0 | 69 | 9.7%
9.999500033 × 10⁻⁵ | 44 | 6.2%
0.0001999800027 | 8 | 1.1%
0.000299955009 | 4 | 0.6%
0.0003999200213 | 2 | 0.3%
0.0004998750417 | 1 | 0.1%
0.000599820072 | 5 | 0.7%
0.0006997551143 | 6 | 0.8%
0.0007996801706 | 7 | 1.0%
0.0008995952428 | 7 | 1.0%

Maximum 10 values
Value | Count | Frequency (%)
0.01340968691 | 1 | 0.1%
0.01301493708 | 1 | 0.1%
0.01271877241 | 1 | 0.1%
0.01252128055 | 1 | 0.1%
0.01232374969 | 2 | 0.3%
0.01202738021 | 3 | 0.4%
0.01143437763 | 1 | 0.1%
0.01123663193 | 1 | 0.1%
0.01074209653 | 1 | 0.1%
0.0106431601 | 2 | 0.3%

Best
Categorical

HIGH CORRELATION

Distinct: 6
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 5.7 KiB

Length

Max length: 6
Median length: 3
Mean length: 4.047887324
Min length: 2
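The length statistics follow from the category counts in this column's Common Values table. A sketch that expands {value: count} into per-row lengths; the helper name `length_stats` is hypothetical:

```python
import statistics


def length_stats(value_counts):
    """Length statistics of a categorical column given {value: count}."""
    lengths = [len(value) for value, count in value_counts.items() for _ in range(count)]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.fmean(lengths),
        "min": min(lengths),
        "total_characters": sum(lengths),
    }


best = {"Mkt_RF": 251, "SMB": 127, "HML": 124, "RMW": 122, "CMA": 77, "RF": 9}
print(length_stats(best))
```

Total characters is 251 × 6 + (127 + 124 + 122 + 77) × 3 + 9 × 2 = 2874, and 2874 / 710 gives the mean length of about 4.048.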

Characters and Unicode

Total characters: 2874
Distinct characters: 13
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
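Python's `unicodedata.category` returns the general-category codes behind the category tables (`Lu`, `Ll`, `Pc`). A sketch, assuming the column is summarised as {value: count}; the standard library has no script-property lookup, so only categories and ASCII membership are tallied. `character_profile` is a hypothetical name:

```python
import unicodedata
from collections import Counter

# Long names for the general categories that occur in this column.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Pc": "Connector Punctuation",
}


def character_profile(value_counts):
    """Tally characters by Unicode general category and by ASCII membership."""
    categories = Counter()
    ascii_chars = 0
    for value, count in value_counts.items():
        for ch in value:
            code = unicodedata.category(ch)
            categories[CATEGORY_NAMES.get(code, code)] += count
            if ord(ch) < 128:
                ascii_chars += count
    return categories, ascii_chars


best = {"Mkt_RF": 251, "SMB": 127, "HML": 124, "RMW": 122, "CMA": 77, "RF": 9}
categories, ascii_chars = character_profile(best)
print(categories)
print(ascii_chars)
```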

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: RMW
2nd row: Mkt_RF
3rd row: CMA
4th row: RMW
5th row: CMA

Common Values

Value | Count | Frequency (%)
Mkt_RF | 251 | 35.4%
SMB | 127 | 17.9%
HML | 124 | 17.5%
RMW | 122 | 17.2%
CMA | 77 | 10.8%
RF | 9 | 1.3%

Length

[Figure: Histogram of lengths of the category]

[Figure: Category Frequency Plot]
Value | Count | Frequency (%)
mkt_rf | 251 | 35.4%
smb | 127 | 17.9%
hml | 124 | 17.5%
rmw | 122 | 17.2%
cma | 77 | 10.8%
rf | 9 | 1.3%

Most occurring characters

Value | Count | Frequency (%)
M | 701 | 24.4%
R | 382 | 13.3%
F | 260 | 9.0%
k | 251 | 8.7%
t | 251 | 8.7%
_ | 251 | 8.7%
S | 127 | 4.4%
B | 127 | 4.4%
H | 124 | 4.3%
L | 124 | 4.3%
Other values (3) | 276 | 9.6%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 2121 | 73.8%
Lowercase Letter | 502 | 17.5%
Connector Punctuation | 251 | 8.7%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
M | 701 | 33.1%
R | 382 | 18.0%
F | 260 | 12.3%
S | 127 | 6.0%
B | 127 | 6.0%
H | 124 | 5.8%
L | 124 | 5.8%
W | 122 | 5.8%
C | 77 | 3.6%
A | 77 | 3.6%

Lowercase Letter
Value | Count | Frequency (%)
k | 251 | 50.0%
t | 251 | 50.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 251 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 2623 | 91.3%
Common | 251 | 8.7%

Most frequent character per script

Latin
Value | Count | Frequency (%)
M | 701 | 26.7%
R | 382 | 14.6%
F | 260 | 9.9%
k | 251 | 9.6%
t | 251 | 9.6%
S | 127 | 4.8%
B | 127 | 4.8%
H | 124 | 4.7%
L | 124 | 4.7%
W | 122 | 4.7%
Other values (2) | 154 | 5.9%

Common
Value | Count | Frequency (%)
_ | 251 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2874 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
M | 701 | 24.4%
R | 382 | 13.3%
F | 260 | 9.0%
k | 251 | 8.7%
t | 251 | 8.7%
_ | 251 | 8.7%
S | 127 | 4.4%
B | 127 | 4.4%
H | 124 | 4.3%
L | 124 | 4.3%
Other values (3) | 276 | 9.6%

Worst
Categorical

HIGH CORRELATION

Distinct: 6
Distinct (%): 0.8%
Missing: 0
Missing (%): 0.0%
Memory size: 5.7 KiB

Length

Max length: 6
Median length: 3
Mean length: 3.804225352
Min length: 2

Characters and Unicode

Total characters: 2701
Distinct characters: 13
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: CMA
2nd row: SMB
3rd row: Mkt_RF
4th row: CMA
5th row: SMB

Common Values

Value | Count | Frequency (%)
Mkt_RF | 194 | 27.3%
SMB | 142 | 20.0%
HML | 141 | 19.9%
RMW | 131 | 18.5%
CMA | 91 | 12.8%
RF | 11 | 1.5%

Length

[Figure: Histogram of lengths of the category]

[Figure: Category Frequency Plot]
Value | Count | Frequency (%)
mkt_rf | 194 | 27.3%
smb | 142 | 20.0%
hml | 141 | 19.9%
rmw | 131 | 18.5%
cma | 91 | 12.8%
rf | 11 | 1.5%

Most occurring characters

Value | Count | Frequency (%)
M | 699 | 25.9%
R | 336 | 12.4%
F | 205 | 7.6%
k | 194 | 7.2%
t | 194 | 7.2%
_ | 194 | 7.2%
S | 142 | 5.3%
B | 142 | 5.3%
H | 141 | 5.2%
L | 141 | 5.2%
Other values (3) | 313 | 11.6%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 2119 | 78.5%
Lowercase Letter | 388 | 14.4%
Connector Punctuation | 194 | 7.2%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
M | 699 | 33.0%
R | 336 | 15.9%
F | 205 | 9.7%
S | 142 | 6.7%
B | 142 | 6.7%
H | 141 | 6.7%
L | 141 | 6.7%
W | 131 | 6.2%
C | 91 | 4.3%
A | 91 | 4.3%

Lowercase Letter
Value | Count | Frequency (%)
k | 194 | 50.0%
t | 194 | 50.0%

Connector Punctuation
Value | Count | Frequency (%)
_ | 194 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 2507 | 92.8%
Common | 194 | 7.2%

Most frequent character per script

Latin
Value | Count | Frequency (%)
M | 699 | 27.9%
R | 336 | 13.4%
F | 205 | 8.2%
k | 194 | 7.7%
t | 194 | 7.7%
S | 142 | 5.7%
B | 142 | 5.7%
H | 141 | 5.6%
L | 141 | 5.6%
W | 131 | 5.2%
Other values (2) | 182 | 7.3%

Common
Value | Count | Frequency (%)
_ | 194 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2701 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
M | 699 | 25.9%
R | 336 | 12.4%
F | 205 | 7.6%
k | 194 | 7.2%
t | 194 | 7.2%
_ | 194 | 7.2%
S | 142 | 5.3%
B | 142 | 5.3%
H | 141 | 5.2%
L | 141 | 5.2%
Other values (3) | 313 | 11.6%

Interactions

[Figures: pairwise interaction plots for the six numeric variables]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates a perfect negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates a perfect positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the rank variables' standard deviations.
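A minimal sketch of that computation: rank each variable (tied values get their average rank) and take Pearson's r of the ranks. The helper names are hypothetical; pandas exposes the same statistic as `DataFrame.corr(method="spearman")`:

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance over the product of standard deviations (the n - 1 factors cancel)."""
    mx, my = math.fsum(x) / len(x), math.fsum(y) / len(y)
    cov = math.fsum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(math.fsum((a - mx) ** 2 for a in x))
    sy = math.sqrt(math.fsum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]
print(spearman(x, y), pearson(x, y))
```

For y = x², Spearman's ρ is exactly 1 because the relationship is monotonic, while Pearson's r on the raw values is slightly below 1.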

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates a perfect negative linear correlation, 0 indicates no linear correlation, and +1 indicates a perfect positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exactly linear relationship the steepness of the line does not affect r: any line with nonzero slope gives r = +1 or r = -1.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
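A sketch illustrating the invariance property on toy data (the helper name `pearson` and the numbers are hypothetical):

```python
import math


def pearson(x, y):
    """Covariance of x and y over the product of their standard deviations.

    The 1/(n - 1) factors cancel, so plain sums of deviations are used.
    """
    mx = math.fsum(x) / len(x)
    my = math.fsum(y) / len(y)
    cov = math.fsum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(math.fsum((a - mx) ** 2 for a in x))
    sy = math.sqrt(math.fsum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


x = [0.5, 1.2, 2.0, 3.3, 4.1]
y = [1.0, 0.7, 2.5, 2.9, 4.4]

# Shifting and positively rescaling either variable leaves r unchanged.
print(pearson(x, y), pearson([10 * a - 3 for a in x], [0.1 * b + 7 for b in y]))

# Any exactly linear relationship gives r = +1 or -1, whatever its slope.
print(pearson(x, [0.01 * a for a in x]), pearson(x, [-50 * a for a in x]))
```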

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates a perfect negative correlation, 0 indicates no correlation, and +1 indicates a perfect positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2.
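The count-based definition translates directly into code. This sketch computes τ-a, where tied pairs count as neither concordant nor discordant; SciPy's `scipy.stats.kendalltau` defaults to the tie-adjusted τ-b, which can differ when ties are present. The helper name is hypothetical:

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs, with ties counted as neither."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

Here 5 of the 6 pairs are concordant and 1 is discordant, so τ = (5 - 1) / 6.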

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples, so we use the bias-corrected measure proposed by Bergsma in 2013.
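A sketch of a bias-corrected V, assuming Bergsma's correction in its usual form: φ² = χ²/n is reduced by (k - 1)(r - 1)/(n - 1) and floored at 0, and the row and column counts r and k are shrunk before taking min(k - 1, r - 1). No continuity correction is applied to χ², and the helper name is hypothetical:

```python
import math
from collections import Counter


def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equal-length sequences of labels."""
    n = len(x)
    joint = Counter(zip(x, y))
    row_totals, col_totals = Counter(x), Counter(y)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic of the r x k contingency table.
    chi2 = 0.0
    for a, n_a in row_totals.items():
        for b, n_b in col_totals.items():
            expected = n_a * n_b / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected

    # Bias corrections for phi^2 and for the table dimensions.
    phi2 = max(0.0, chi2 / n - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2 / denom) if denom > 0 else 0.0


labels = ["a", "b"] * 50
print(cramers_v_corrected(labels, labels))
```

In this report the natural inputs for this measure are the categorical columns Best and Worst.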

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
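Both missing-value views reduce to per-cell presence checks. A sketch that returns per-column non-missing counts and a text nullity matrix, assuming `None` marks a missing cell; `nullity_summary` is a hypothetical helper:

```python
def nullity_summary(rows, columns):
    """Per-column non-missing counts plus a text nullity matrix.

    In each matrix line '#' marks a present value and '.' a missing one (None).
    """
    counts = {col: sum(1 for row in rows if row.get(col) is not None) for col in columns}
    matrix = [
        "".join("#" if row.get(col) is not None else "." for col in columns)
        for row in rows
    ]
    return counts, matrix


rows = [{"a": 1, "b": None}, {"a": None, "b": 2}, {"a": 3, "b": 4}]
counts, matrix = nullity_summary(rows, ["a", "b"])
print(counts)
print("\n".join(matrix))
```

With no missing cells in this dataset, every column count would equal 710 and every matrix line would be fully filled.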

Sample

First rows

Index | Date | Mkt_RF | SMB | HML | RMW | CMA | RF | Best | Worst
0 | 1963-07-01 | -0.003908 | -0.004108 | -0.009747 | 0.006777 | -0.011870 | 0.002696 | RMW | CMA
1 | 1963-08-01 | 0.049457 | -0.008032 | 0.017840 | 0.003594 | -0.003506 | 0.002497 | Mkt_RF | SMB
2 | 1963-09-01 | -0.015825 | -0.005214 | 0.001299 | -0.007125 | 0.002896 | 0.002696 | CMA | Mkt_RF
3 | 1963-10-01 | 0.024985 | -0.013998 | -0.001001 | 0.027615 | -0.020305 | 0.002896 | RMW | CMA
4 | 1963-11-01 | -0.008536 | -0.008839 | 0.017349 | -0.005113 | 0.022153 | 0.002696 | CMA | SMB
5 | 1963-12-01 | 0.018135 | -0.021224 | -0.000200 | 0.000300 | -0.000700 | 0.002896 | Mkt_RF | SMB
6 | 1964-01-01 | 0.022153 | 0.001299 | 0.014692 | 0.001699 | 0.014593 | 0.002996 | Mkt_RF | SMB
7 | 1964-02-01 | 0.015283 | 0.002796 | 0.027712 | -0.000500 | 0.009059 | 0.002597 | HML | RMW
8 | 1964-03-01 | 0.014002 | 0.012225 | 0.033435 | -0.022348 | 0.031692 | 0.003095 | HML | RMW
9 | 1964-04-01 | 0.001000 | -0.015317 | -0.006723 | -0.012781 | -0.010859 | 0.002896 | RF | SMB
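The sample rows are consistent with Best and Worst naming the column with the largest and the smallest value among Mkt_RF, SMB, HML, RMW, CMA and RF in each row. That rule is an inference from the rows shown, not something the report states. A sketch of it (hypothetical helper `best_and_worst`), checked against rows 0 and 9:

```python
# Column order as in the sample table; the Best/Worst rule itself is inferred.
FACTORS = ["Mkt_RF", "SMB", "HML", "RMW", "CMA", "RF"]


def best_and_worst(row):
    """Names of the columns holding the largest and smallest value in a row."""
    best = max(FACTORS, key=lambda col: row[col])
    worst = min(FACTORS, key=lambda col: row[col])
    return best, worst


row0 = {"Mkt_RF": -0.003908, "SMB": -0.004108, "HML": -0.009747,
        "RMW": 0.006777, "CMA": -0.011870, "RF": 0.002696}
row9 = {"Mkt_RF": 0.001000, "SMB": -0.015317, "HML": -0.006723,
        "RMW": -0.012781, "CMA": -0.010859, "RF": 0.002896}
print(best_and_worst(row0))  # ('RMW', 'CMA'), matching row 0
print(best_and_worst(row9))  # ('RF', 'SMB'), matching row 9
```

In pandas the same rule could be written as `df[cols].idxmax(axis=1)` and `df[cols].idxmin(axis=1)`.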

Last rows

Index | Date | Mkt_RF | SMB | HML | RMW | CMA | RF | Best | Worst
700 | 2021-11-01 | -0.015621 | -0.017757 | -0.004410 | 0.069713 | 0.017250 | 0.000000 | RMW | SMB
701 | 2021-12-01 | 0.030529 | -0.007730 | 0.032274 | 0.048028 | 0.043347 | 0.000100 | RMW | SMB
702 | 2022-01-01 | -0.064539 | -0.041343 | 0.120003 | 0.008662 | 0.074272 | 0.000000 | HML | Mkt_RF
703 | 2022-02-01 | -0.023166 | 0.029170 | 0.029947 | -0.021019 | 0.030820 | 0.000000 | CMA | Mkt_RF
704 | 2022-03-01 | 0.030044 | -0.021734 | -0.018164 | -0.015723 | 0.031208 | 0.000100 | CMA | SMB
705 | 2022-04-01 | -0.099378 | -0.004008 | 0.060060 | 0.035657 | 0.057514 | 0.000100 | HML | Mkt_RF
706 | 2022-05-01 | -0.003406 | -0.000600 | 0.080750 | 0.014297 | 0.039028 | 0.000300 | HML | Mkt_RF
707 | 2022-06-01 | -0.088066 | 0.012916 | -0.061556 | 0.018331 | -0.048140 | 0.000600 | RMW | Mkt_RF
708 | 2022-07-01 | 0.091393 | 0.018527 | -0.041864 | 0.006777 | -0.071926 | 0.000800 | Mkt_RF | CMA
709 | 2022-08-01 | -0.038533 | 0.014987 | 0.003095 | -0.049190 | 0.013015 | 0.001898 | SMB | RMW